Introduction to R and RStudio

Scatterplot with birth data:

require(mosaic)
require(mosaicData)
xyplot(births ~ dayofyear, data = Births78)

Other commands we ran on day 1 of the workshop, with some comments on them:

histogram(~ age, data = HELPrct) #histogram of age

favstats(~ age, data = HELPrct) #My favorite statistics; it ignores NAs
##  min Q1 median Q3 max     mean       sd   n missing
##   19 30     35 40  60 35.65342 7.710266 453       0
tally(~sex, data = HELPrct) #Count of gender
## 
## female   male 
##    107    346
tally(~sex, format = "percent", data = HELPrct) #Percents of gender
## 
##   female     male 
## 23.62031 76.37969
tally(~sex, format = "proportion", data = HELPrct) #Proportions of gender
## 
##    female      male 
## 0.2362031 0.7637969
tally(~substance, data = HELPrct) #Count of substance
## 
## alcohol cocaine  heroin 
##     177     152     124
tally(~substance, format = "perc", data = HELPrct) #Percents of substance
## 
##  alcohol  cocaine   heroin 
## 39.07285 33.55408 27.37307
tally(sex ~ substance, data = HELPrct) #Cross-tab sex & substance
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
tally(~ sex + substance, data = HELPrct) #Ditto, just a different format
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94

Also, remember the mplot command for producing graphs interactively; clicking on ‘Show Expression’ prints the code that generates the graph.

Data Structures and Tidy Data

Consider cases and variables

In tidy data:

All of this goes into a codebook only.

Visualizing Data

DTK:

* Glyphs are marks. Data glyphs are also marks, and the features of the glyphs encode the values of the variables; these visual properties are called aesthetics.
* The choice that we make as experts in our field is which aesthetic to map to which variable. The word ‘aesthetic’ here is taken from its early …
* When we make data glyphs we map variables to aesthetics.
* A scale is for a computer and a guide is for a person (about the aesthetics). A legend (beside a graph) is an example of a guide.
* A data table is glyph-ready when there is one row for each glyph to be drawn (I could say, “that’s x-position, that’s y-position, that’s color, that’s size”).
* Glyph-ready data are tidy data, but tidy data are not necessarily glyph-ready.
* Sometimes glyphs represent the collective properties of variables, e.g. in the case of histograms.
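To make the glyph-ready idea concrete, here is a minimal sketch with a small made-up table (the names Patients and SubstanceCounts are hypothetical, not from the workshop). The tidy table has one row per person, whereas a bar chart needs one row per bar.

```r
library(dplyr)

# Hypothetical tidy table: one row per case (person)
Patients <- data.frame(
  id = 1:6,
  substance = c("alcohol", "cocaine", "alcohol", "heroin", "alcohol", "cocaine")
)

# Glyph-ready table for a bar chart: one row per bar to be drawn
SubstanceCounts <- Patients %>%
  count(substance)

SubstanceCounts
```

Both tables are tidy, but only the second has one row per glyph for a bar chart.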

RP:

require(lubridate)
data(Births78)
head(Births78, 3)
##         date births dayofyear
## 1 1978-01-01   7701         1
## 2 1978-01-02   7527         2
## 3 1978-01-03   8825         3
ggplot(data = Births78, aes(x = date, y = births)) + geom_point()

But we need to add days of the week because that’s more useful to us:

Births78 <- 
  Births78 %>% 
  mutate(wday = wday(date, label = TRUE))
ggplot(data = Births78, aes(x = date, y = births, color = wday)) + geom_point() 

Note that the same graph would be generated by the following:

  ggplot(data = Births78) + geom_point(aes(x = date, y = births, color = wday)) 

We could change this to a line graph:

 ggplot(data = Births78) + geom_line(aes(x = date, y = births, color = wday)) 

Or we could have points and lines; note that we have moved the aes commands into the ggplot command because it applies to all the layers.

 ggplot(data = Births78, aes(x = date, y = births, color = wday)) + geom_line() + geom_point()

We could have put the data outside the ggplot command using the magrittr pipe:

 Births78 %>%
  ggplot(aes(x = date, y = births, color = wday)) + geom_line() + geom_point()

For things like a fixed color we need to set rather than map: you set the color you want inside the individual geom, outside aes() (unlike ggvis, where setting is done with :=). Within the ggplot() call itself you can only map, not set.

 Births78 %>%
  ggplot(aes(x = date, y = births)) + geom_point(color = "navy")
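The difference between mapping and setting can be sketched side by side. This is only an illustration on a made-up table (Toy and the p_* names are hypothetical); the third plot shows a common slip rather than something done in the workshop.

```r
library(ggplot2)

# Hypothetical data, only to illustrate the difference
Toy <- data.frame(
  x = 1:5,
  y = c(2, 4, 3, 5, 1),
  group = c("a", "a", "b", "b", "b")
)

# Mapping: color depends on a variable, so it goes inside aes() and gets a legend
p_map <- ggplot(Toy, aes(x = x, y = y)) + geom_point(aes(color = group))

# Setting: color is a fixed value, so it goes outside aes() and gets no legend
p_set <- ggplot(Toy, aes(x = x, y = y)) + geom_point(color = "navy")

# A common slip: a constant inside aes() is treated as data, not as a color name
p_slip <- ggplot(Toy, aes(x = x, y = y)) + geom_point(aes(color = "navy"))
```

Printing p_slip draws the points in ggplot2's default color with a legend entry labelled "navy", which is usually a sign that the value should have been set rather than mapped.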

Combine the colored lines with navy points - notice that wday is in the aes in the geom_line whereas with geom_point we don’t call on aes because we’re simply setting (not mapping).

 Births78 %>%
  ggplot(aes(x = date, y = births)) + 
    geom_line(aes(color = wday)) + 
    geom_point(color = "navy")

Recall that we can check out the kinds of geoms that exist if we use the command apropos, notice the use of the caret (^)

apropos("^geom")
##  [1] "geom_abline"     "geom_area"       "geom_bar"       
##  [4] "geom_bin2d"      "geom_blank"      "geom_boxplot"   
##  [7] "geom_contour"    "geom_crossbar"   "geom_density"   
## [10] "geom_density2d"  "geom_dotplot"    "geom_errorbar"  
## [13] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"       
## [16] "geom_histogram"  "geom_hline"      "geom_jitter"    
## [19] "geom_line"       "geom_linerange"  "geom_map"       
## [22] "geom_path"       "geom_point"      "geom_pointrange"
## [25] "geom_polygon"    "geom_quantile"   "geom_raster"    
## [28] "geom_rect"       "geom_ribbon"     "geom_rug"       
## [31] "geom_segment"    "geom_smooth"     "geom_step"      
## [34] "geom_text"       "geom_tile"       "geom_violin"    
## [37] "geom_vline"
HELPrct %>% 
ggplot(aes(x = substance)) + 
geom_bar()

Notice that we were able to construct the bar chart even though our data weren’t glyph-ready with counts; ggplot did it for us.
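If the data were already glyph-ready, with one row per bar, we would not want ggplot to count again. A minimal sketch, assuming a reasonably recent ggplot2 (geom_col() is not in the older geom list shown in these notes); the counts are copied from the tally(~substance, data = HELPrct) output earlier in these notes and SubstanceCounts is a hypothetical name:

```r
library(ggplot2)

# Counts copied from the tally(~substance, data = HELPrct) output
SubstanceCounts <- data.frame(
  substance = c("alcohol", "cocaine", "heroin"),
  n = c(177, 152, 124)
)

# geom_col() uses the y values as given instead of counting rows
p <- ggplot(SubstanceCounts, aes(x = substance, y = n)) +
  geom_col()
```

In older ggplot2 versions the equivalent would be geom_bar(stat = "identity").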

HELPrct %>% 
    ggplot(aes(x = age)) + 
    geom_histogram(binwidth = 2)

We also often want to use frequency polygons or kernel density functions:

HELPrct %>% 
ggplot(aes(x = age)) + 
    geom_freqpoly(binwidth = 2)

HELPrct %>% 
ggplot(aes(x = age)) + 
    geom_density()

But note that RP prefers to draw the density as a line plot:

HELPrct %>% 
ggplot(aes(x = age)) + 
geom_line(stat = "density")

Or we could have put the geom in the stat_density

HELPrct %>%
ggplot(aes(x=age)) +
stat_density( geom="line")
## ymax not defined: adjusting position using y instead

Now generate your own graph looking at the average consumption of drinks (the variable i1), which I did by substance group:

HELPrct %>% 
    ggplot(aes(x = i1)) + 
    geom_line(stat = "density", aes(color = factor(substance)))

Data Wrangling

There are 5 kinds of things (objects) in R:

  1. data tables (aka dataframes), our convention is for first letter upper-case, e.g. BabyNames
  2. variables, i.e. the stuff in the tables; our convention is for first letter lower-case, e.g. year
  3. scalars, e.g. "Treatment" or 42 (thanks DA)
  4. functions & their arguments: e.g. positional arguments, named arguments (parentheses follow names of things if and only if those things are functions, e.g. sd(); multiple arguments separated with commas)
  5. pipes: arrange sequences; %>% takes something and puts it as the (first) input to a function, e.g. something %>% function() rather than function(something, other stuff, ...).
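A small sketch of items 3 to 5, using only a made-up numeric vector (x, a, b and c are hypothetical names): the same computation written as nested function calls, with a named argument, and as a pipe.

```r
library(dplyr)  # provides %>%

x <- c(4, 9, 16)

# Nested calls: read from the inside out; 1 is a positional argument
a <- round(sqrt(x), 1)

# The same call with a named argument
b <- round(sqrt(x), digits = 1)

# Piped: read left to right; the pipe passes x along as the first argument
c <- x %>% sqrt() %>% round(1)

identical(a, b) && identical(a, c)
```

All three forms give the same result; the pipe only changes the order in which we read the steps.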

3 Main kinds of functions for Wrangling

  1. Data verb takes a data table as input, does something & gives a data table as output
  2. Transformation function takes a variable and transforms it in some way; takes a variable as input and produces a variable as output (a la Sraffa?)
  3. Reduction function takes a variable (a bunch of values) and turns it into a single value, e.g. the mean, max, standard deviation, etc.
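The three kinds of functions can be seen together in one short pipeline. This is a sketch on a made-up table (Heights and Result are hypothetical names), not one of the workshop data sets.

```r
library(dplyr)

# Hypothetical data table (upper-case first letter, following the naming convention)
Heights <- data.frame(name = c("Ann", "Bo", "Cy"), cm = c(150, 160, 170))

Result <- Heights %>%
  mutate(m = cm / 100) %>%      # mutate() is a data verb; cm / 100 is a transformation
  summarise(mean_m = mean(m))   # summarise() is a data verb; mean() is a reduction

Result
```

Each data verb takes a table in and gives a table out, which is why the verbs can be chained with the pipe.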

To run the following you may need to install the packages NHANES, dplyr, and babynames. Run the command, e.g. install.packages("NHANES"). Remember to require these packages.

Now we run some commands to start wrangling the data.

require(NHANES)
require(dplyr)
require(babynames)
NHANESmajors <- filter(NHANES, Age >= 21)
nowsmoking <- select(NHANES, Age, SmokeNow)
nowsmoking %>% 
  head()
## Source: local data frame [6 x 2]
## 
##   Age SmokeNow
## 1  34       No
## 2  34       No
## 3  34       No
## 4   4       NA
## 5  49      Yes
## 6   9       NA
group_by(babynames, name, year)  
## Source: local data frame [1,792,091 x 5]
## Groups: name, year
## 
##    year sex      name    n       prop
## 1  1880   F      Mary 7065 0.07238359
## 2  1880   F      Anna 2604 0.02667896
## 3  1880   F      Emma 2003 0.02052149
## 4  1880   F Elizabeth 1939 0.01986579
## 5  1880   F    Minnie 1746 0.01788843
## 6  1880   F  Margaret 1578 0.01616720
## 7  1880   F       Ida 1472 0.01508119
## 8  1880   F     Alice 1414 0.01448696
## 9  1880   F    Bertha 1320 0.01352390
## 10 1880   F     Sarah 1288 0.01319605
## ..  ... ...       ...  ...        ...
head(KidsFeet)
##     name birthmonth birthyear length width sex biggerfoot domhand
## 1  David          5        88   24.4   8.4   B          L       R
## 2   Lars         10        87   25.4   8.8   B          L       L
## 3   Zach         12        87   24.5   9.7   B          R       R
## 4   Josh          1        88   25.2   9.8   B          L       R
## 5   Lang          2        88   25.1   8.9   B          L       R
## 6 Scotty          3        88   25.7   9.7   B          R       R
select(KidsFeet, birthyear, domhand) %>% 
  head()
##   birthyear domhand
## 1        88       R
## 2        87       L
## 3        87       R
## 4        88       R
## 5        88       R
## 6        88       R
righties <- filter(KidsFeet, domhand == "R")
head(righties)
##     name birthmonth birthyear length width sex biggerfoot domhand
## 1  David          5        88   24.4   8.4   B          L       R
## 2   Zach         12        87   24.5   9.7   B          R       R
## 3   Josh          1        88   25.2   9.8   B          L       R
## 4   Lang          2        88   25.1   8.9   B          L       R
## 5 Scotty          3        88   25.7   9.7   B          R       R
## 6 Edward          2        88   26.1   9.6   B          L       R

The pipe (%>%) makes it clear what the individual lines are doing.

KidsFeetArea <- KidsFeet %>% 
  mutate(area = length * width) %>% 
  head()

arrange orders your cases in terms of the variable you select in ascending order, e.g. arrange(area).

KidsFeet %>% 
  mutate(area = length * width) %>% 
  arrange(area) %>% 
  head()
##      name birthmonth birthyear length width sex biggerfoot domhand   area
## 1  Hayley          1        88   21.6   7.9   G          R       R 170.64
## 2    Kate          4        88   23.7   7.9   G          R       R 187.23
## 3 Caitlin          7        88   22.5   8.6   G          R       R 193.50
## 4  Hannah          3        88   22.9   8.5   G          L       R 194.65
## 5   Peggy         10        88   24.2   8.1   G          L       R 196.02
## 6   Laura          9        88   24.0   8.3   G          R       L 199.20

To arrange in descending order, you can add desc() within arrange.

KidsFeet %>% 
  mutate(area = length * width) %>% 
  arrange(desc(area)) %>% 
  head()
##     name birthmonth birthyear length width sex biggerfoot domhand   area
## 1   Mark          9        87   27.5   9.8   B          R       R 269.50
## 2    Cam          3        88   27.0   9.8   B          L       R 264.60
## 3   Glen          7        88   27.1   9.4   B          L       R 254.74
## 4 Edward          2        88   26.1   9.6   B          L       R 250.56
## 5 Scotty          3        88   25.7   9.7   B          R       R 249.29
## 6   Abby          2        88   26.1   9.5   G          L       R 247.95

We can also add another argument to arrange to break ties, e.g. adding birthmonth (I’m going to head for 10 cases):

KidsFeet %>% 
  mutate(area = length * width) %>% 
  arrange(area, desc(birthmonth)) %>% 
  head(10)
##        name birthmonth birthyear length width sex biggerfoot domhand
## 1    Hayley          1        88   21.6   7.9   G          R       R
## 2      Kate          4        88   23.7   7.9   G          R       R
## 3   Caitlin          7        88   22.5   8.6   G          R       R
## 4    Hannah          3        88   22.9   8.5   G          L       R
## 5     Peggy         10        88   24.2   8.1   G          L       R
## 6     Laura          9        88   24.0   8.3   G          R       L
## 7     Damon          9        88   22.9   8.8   B          R       L
## 8   Caitlin          6        88   23.0   8.8   G          L       R
## 9     David          5        88   24.4   8.4   B          L       R
## 10 Caroline         12        87   24.0   8.7   G          R       L
##      area
## 1  170.64
## 2  187.23
## 3  193.50
## 4  194.65
## 5  196.02
## 6  199.20
## 7  201.52
## 8  202.40
## 9  204.96
## 10 208.80
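The KidsFeet areas shown here have no exact ties, so the second argument never comes into play. A minimal sketch with a made-up table that does contain a tie (Feet and Sorted are hypothetical names):

```r
library(dplyr)

# Hypothetical table with a tie on area (rows A and C)
Feet <- data.frame(
  name = c("A", "B", "C"),
  area = c(200, 180, 200),
  birthmonth = c(3, 5, 11)
)

# Sort by area ascending; within equal areas, later birth months come first
Sorted <- Feet %>%
  arrange(area, desc(birthmonth))

Sorted
```

B comes first because it has the smallest area; C then precedes A because the two share an area and C has the later birth month.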

The verb summarise creates a single row of summary statistics; you choose the statistics by giving each one a label and the function that computes it.

NewThing <- KidsFeet %>% 
  mutate(area = length * width)
AnotherNewThing <- NewThing %>% 
  summarise(ave_area = mean(area))
head(NewThing)
##     name birthmonth birthyear length width sex biggerfoot domhand   area
## 1  David          5        88   24.4   8.4   B          L       R 204.96
## 2   Lars         10        87   25.4   8.8   B          L       L 223.52
## 3   Zach         12        87   24.5   9.7   B          R       R 237.65
## 4   Josh          1        88   25.2   9.8   B          L       R 246.96
## 5   Lang          2        88   25.1   8.9   B          L       R 223.39
## 6 Scotty          3        88   25.7   9.7   B          R       R 249.29
head(AnotherNewThing)
##   ave_area
## 1 222.7369

The important feature of summarise is that it creates one row with whichever summary variables you specify, e.g. the mean and also the max of a variable:

AnotherNewThing <- NewThing %>% 
  summarise(ave_area = mean(area), max_area = max(area))
AnotherNewThing
##   ave_area max_area
## 1 222.7369    269.5

We have looked at 5 commands so far:

  1. filter
  2. select
  3. mutate
  4. arrange
  5. summarise

We shall add another, more complex command, group_by.

Let us group the kids by sex and then summarise within each group:

NewThing %>% 
  group_by(sex) %>%
  summarise(ave_area = mean(area), max_area = max(area))
## Source: local data frame [2 x 3]
## 
##   sex ave_area max_area
## 1   B 231.0140   269.50
## 2   G 214.0242   247.95

Or we could just use KidsFeet rather than assigning things to dataframes like we did with NewThing.

KidsFeet %>% 
  mutate(area = length * width) %>% 
  group_by(sex) %>%
  summarise(ave_area = mean(area), max_area = max(area))
## Source: local data frame [2 x 3]
## 
##   sex ave_area max_area
## 1   B 231.0140   269.50
## 2   G 214.0242   247.95

Or we could group by both dominant hand and birth year:

KidsFeet %>% 
  mutate(area = length * width) %>% 
  group_by(domhand, birthyear) %>%
  summarise(ave_area = mean(area), max_area = max(area))
## Source: local data frame [4 x 4]
## Groups: domhand
## 
##   domhand birthyear ave_area max_area
## 1       L        87 216.1600   223.52
## 2       L        88 214.6850   240.30
## 3       R        87 245.7420   269.50
## 4       R        88 220.6769   264.60

It’s important to remain aware of the change in the cases when you use group_by followed by summarise:

  1. You have changed the cases: before group_by a case was, e.g., a kid; afterwards a case is a group of kids.
  2. Not all the variables in the original data will appear in the output; only the ones you group by and the ones you create will be there.
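The change in cases is easy to check by counting rows and listing variables before and after. A sketch on a made-up table (Kids and BySex are hypothetical names):

```r
library(dplyr)

# Hypothetical table: one row per kid
Kids <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  sex = c("B", "G", "G", "B", "G"),
  area = c(230, 210, 200, 250, 220)
)

BySex <- Kids %>%
  group_by(sex) %>%
  summarise(ave_area = mean(area), count = n())

nrow(Kids)    # one row per kid
nrow(BySex)   # one row per group of kids
names(BySex)  # only the grouping variable and the new summaries remain
```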

NHANES %>% favstats(BMI ~ Gender, data = .) #This works because I tell it to be put where the period is
##   Gender   min    Q1 median   Q3   max     mean       sd    n missing
## 1 female 12.88 21.24  25.56 31.2 81.25 26.77208 7.898886 4841     179
## 2   male 12.89 22.10  26.31 30.6 63.91 26.54707 6.807454 4793     187
mean(BMI ~ Gender, na.rm = TRUE, data = NHANES) #We have to tell it to ignore NAs
##   female     male 
## 26.77208 26.54707
favstats(BMI ~ Gender, data = NHANES) #Ignores NAs
##   Gender   min    Q1 median   Q3   max     mean       sd    n missing
## 1 female 12.88 21.24  25.56 31.2 81.25 26.77208 7.898886 4841     179
## 2   male 12.89 22.10  26.31 30.6 63.91 26.54707 6.807454 4793     187
NHANES %>% 
  group_by(Gender) %>%
  summarise(mean_bmi = mean(BMI, na.rm= TRUE), count = n())
## Source: local data frame [2 x 3]
## 
##   Gender mean_bmi count
## 1 female 26.77208  5020
## 2   male 26.54707  4980
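Note that count here (5020 and 4980) is larger than the n reported by favstats (4841 and 4793): n() counts every row in the group, including those with a missing BMI. A minimal base R sketch of the same behaviour on a made-up vector (bmi is a hypothetical name):

```r
bmi <- c(22.5, NA, 27.5)  # hypothetical values with one missing

mean(bmi)                 # missing, because one value is missing
mean(bmi, na.rm = TRUE)   # missing values are dropped before averaging
length(bmi)               # counts all cases, as n() does inside summarise
sum(!is.na(bmi))          # counts only the non-missing cases
```

Inside summarise, sum(!is.na(BMI)) would give the count of cases that actually contribute to the mean.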

What happens if we only want to look at 20-year-olds?

NHANES %>% 
  filter(Age == 20) %>%
  group_by(Gender) %>%
  summarise(mean_bmi = mean(BMI, na.rm= TRUE), count = n())
## Source: local data frame [2 x 3]
## 
##   Gender mean_bmi count
## 1 female 24.46838    68
## 2   male 25.61575    73

Data Wrangling Cont. (Weds Morning)

Some functions we used: sample_n() takes a random sample of rows, like we did with the babynames data with NH.
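A minimal sketch of sample_n() on a made-up table (Numbers and Picked are hypothetical names, and the seed is arbitrary):

```r
library(dplyr)

set.seed(2015)  # arbitrary seed so the same rows come back on each run

Numbers <- data.frame(id = 1:10)  # hypothetical table

Picked <- Numbers %>%
  sample_n(3)  # 3 rows drawn at random, without replacement by default

Picked
```

Without set.seed() a different sample comes back on each run, which matters if you want reproducible notes.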

What about functions that take two tables as input and produce one table as output? We often think about this as joining or merging data, and we can use a variety of commands to do this. But we should keep this separate from the idea of concatenation, which adds rows to an already existing data table.
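Concatenation can be sketched with bind_rows() from dplyr. The tables below are made up for illustration (Monday, Tuesday and Both are hypothetical names); no matching of cases is involved, the rows are simply stacked.

```r
library(dplyr)

# Hypothetical tables with the same variables
Monday  <- data.frame(student = c("s1", "s2"), score = c(8, 9))
Tuesday <- data.frame(student = c("s3"), score = c(7))

# Concatenation: stack the rows of one table under the other
Both <- bind_rows(Monday, Tuesday)

Both
```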

Joining data

Inner join: take the cases from the left data table (what I would think of as the ‘Master Data’ from Stata) and join them to the matching cases in the right data table (what I would think of as the ‘merging data’); only cases with a match in both tables are kept.

By default, the join looks for variables with the same name in both tables and uses them to match cases. BB recommends that we always specify the by argument.
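A minimal sketch of an inner join with an explicit by. The tables and their columns are made up for illustration and are not the columns of the workshop files (Grades, Courses and Joined are hypothetical names).

```r
library(dplyr)

# Hypothetical left table: one row per student-course pair
Grades <- data.frame(
  student = c("s1", "s2", "s3"),
  course = c("c1", "c2", "c9"),
  grade = c("A", "B", "C")
)

# Hypothetical right table: one row per course
Courses <- data.frame(
  course = c("c1", "c2", "c3"),
  dept = c("Math", "Bio", "Art")
)

# Inner join: keep only cases whose key appears in both tables
Joined <- Grades %>%
  inner_join(Courses, by = "course")

Joined
```

The row for course c9 is dropped because it has no match in Courses, and c3 is dropped because no grade refers to it. Naming the key in by also protects against an accidental match on some other shared column name.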

[ME: Is there a way to generate unique identifiers algorithmically? That is, the thing I spent most of my time on was generating unique identifiers that worked because I have experience generating unique IDs.]

grades <- read.csv("http://tiny.cc/mosaic/grades.csv",
                   stringsAsFactors = FALSE)
courses <- read.csv("http://tiny.cc/mosaic/courses.csv",
                    stringsAsFactors = FALSE)
grade_to_number <- read.csv("http://tiny.cc/mosaic/grade-to-number.csv",
                            stringsAsFactors = FALSE)

We could join these if we wanted to.

Randy - how to gather data

Randy’s notes are here.

We want:

* a column that says country,
* a column that says year, and
* a column that has the proportion of HIV.

We want to re-shape data (as with reshape in Stata).

We therefore need a way to convert data from wide into long (or long into wide)

To go from wide to narrow (long) we use gather() from the tidyr package. First we read in the data:

HIVdata <- read.csv("http://dtkaplan.github.io/CVC/Summer2015/Learn/TidyData/HIV.csv", stringsAsFactors = FALSE)
head(HIVdata)
##   Estimated.HIV.Prevalence.....Ages.15.49. X1979 X1980 X1981 X1982 X1983
## 1                                 Abkhazia    NA    NA    NA    NA    NA
## 2                              Afghanistan    NA    NA    NA    NA    NA
## 3                    Akrotiri and Dhekelia    NA    NA    NA    NA    NA
## 4                                  Albania    NA    NA    NA    NA    NA
## 5                                  Algeria    NA    NA    NA    NA    NA
## 6                           American Samoa    NA    NA    NA    NA    NA
##   X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991 X1992 X1993 X1994 X1995
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 5    NA    NA    NA    NA    NA    NA  0.06  0.06  0.06  0.06  0.06  0.06
## 6    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##   X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 5  0.06  0.06  0.06  0.06  0.06  0.06  0.06  0.06   0.1   0.1   0.1   0.1
## 6    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##   X2008 X2009 X2010 X2011
## 1    NA    NA    NA    NA
## 2    NA  0.06  0.06  0.06
## 3    NA    NA    NA    NA
## 4    NA    NA    NA    NA
## 5   0.1    NA    NA    NA
## 6    NA    NA    NA    NA

We need to alter this to get rid of the terrible initial name:

HIVdata2 <- 
  HIVdata %>% 
  select(country = starts_with("Estimated"), starts_with("X"))
# select() lets you rename a variable while selecting it: here country = renames the column whose name starts with "Estimated" to country
head(HIVdata2, 3)
##                 country X1979 X1980 X1981 X1982 X1983 X1984 X1985 X1986
## 1              Abkhazia    NA    NA    NA    NA    NA    NA    NA    NA
## 2           Afghanistan    NA    NA    NA    NA    NA    NA    NA    NA
## 3 Akrotiri and Dhekelia    NA    NA    NA    NA    NA    NA    NA    NA
##   X1987 X1988 X1989 X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##   X1999 X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009 X2010
## 1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA  0.06  0.06
## 3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##   X2011
## 1    NA
## 2  0.06
## 3    NA

We need to tell R the data table to work with. We need to tell R which variables get repeated and which get `tipped’ from wide to long. Computer scientists call these key-value pairs.

require(tidyr)
## Loading required package: tidyr
HIVdata3 <-
  HIVdata2 %>% gather( year, HIV.perc, -country)
head(HIVdata3, 3)
##                 country  year HIV.perc
## 1              Abkhazia X1979       NA
## 2           Afghanistan X1979       NA
## 3 Akrotiri and Dhekelia X1979       NA

Notice that we have a - sign in front of country because we want to tip everything except the country column.

But we still need to remove the X from the years; to do that we use the command extract_numeric() which is a transformation function.

HIVdata4 <-
  HIVdata3 %>% mutate( year = extract_numeric(year))
head(HIVdata4, 3)
##                 country year HIV.perc
## 1              Abkhazia 1979       NA
## 2           Afghanistan 1979       NA
## 3 Akrotiri and Dhekelia 1979       NA
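In more recent versions of tidyr, gather() has been superseded by pivot_longer() and extract_numeric() has been deprecated (readr::parse_number() is the suggested replacement). A sketch of the same two steps in one call, assuming a tidyr version that has pivot_longer() with names_transform, and using a made-up table shaped like HIVdata2 (Wide and Long are hypothetical names):

```r
library(tidyr)

# Hypothetical wide table shaped like HIVdata2
Wide <- data.frame(
  country = c("A", "B"),
  X1979 = c(NA, 0.1),
  X1980 = c(0.2, 0.3)
)

Long <- Wide %>%
  pivot_longer(
    cols = -country,                           # tip every column except country
    names_to = "year",                         # old column names become the year variable
    values_to = "HIV.perc",                    # cell values become the HIV.perc variable
    names_prefix = "X",                        # drop the leading X from the names
    names_transform = list(year = as.integer)  # store year as a number
  )

Long
```

The row order differs from gather(), which stacks one original column at a time, but the set of rows is the same.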

We want to graph the data now:

HIVdata4 %>%
  filter(country %in% c("South Africa", "Kenya", "Uganda", "Botswana","Malawi", "Tanzania", "United States")) %>%
  # Notice here that we only keep data from 1989 onwards
  filter(year > 1988) %>%
  ggplot(aes(x = year, y = HIV.perc, color = country)) +
  geom_line(size=2, alpha=0.5) 
## Warning: Removed 7 rows containing missing values (geom_path).

Importing Data

Importing a .csv

You can import a csv either from a local file or from the URL of a csv. For example, we could import: http://dtkaplan.github.io/CVC/Summer2015/Learn/Wrangling/Activity.csv

To remove an object (rather than using the broom icon to sweep them all away), you can type rm(ObjectName), e.g. rm(Activity).
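A minimal sketch of importing and then removing an object. So that it runs offline, it writes a small made-up csv to a temporary file first; the columns id and steps are hypothetical and not those of the workshop file.

```r
# Write a small hypothetical csv to a temporary file so the example runs offline
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, steps = c(100, 250, 175)), path, row.names = FALSE)

# read.csv() takes a local path or a URL in the same argument
Activity <- read.csv(path, stringsAsFactors = FALSE)
n_rows <- nrow(Activity)

# Remove the object from the workspace
rm(Activity)
still_there <- exists("Activity")
```

Replacing path with the URL in quotes would read the workshop file directly, given an internet connection.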

Importing from Excel

We need to use the package readxl.

require(readxl)
Gender <- read_excel("Gender.xls", skip = 2)
#glimpse(Gender[, 1:10])
head(Gender, 3)
## Source: local data frame [3 x 59]
## 
##   Country Name Country Code
## 1        Aruba          ABW
## 2        Aruba          ABW
## 3        Aruba          ABW
## Variables not shown: Indicator Name (chr), Indicator Code (chr), 1960
##   (dbl), 1961 (dbl), 1962 (dbl), 1963 (dbl), 1964 (dbl), 1965 (dbl), 1966
##   (dbl), 1967 (dbl), 1968 (dbl), 1969 (dbl), 1970 (dbl), 1971 (dbl), 1972
##   (dbl), 1973 (dbl), 1974 (dbl), 1975 (dbl), 1976 (dbl), 1977 (dbl), 1978
##   (dbl), 1979 (dbl), 1980 (dbl), 1981 (dbl), 1982 (dbl), 1983 (dbl), 1984
##   (dbl), 1985 (dbl), 1986 (dbl), 1987 (dbl), 1988 (dbl), 1989 (dbl), 1990
##   (dbl), 1991 (dbl), 1992 (dbl), 1993 (dbl), 1994 (dbl), 1995 (dbl), 1996
##   (dbl), 1997 (dbl), 1998 (dbl), 1999 (dbl), 2000 (dbl), 2001 (dbl), 2002
##   (dbl), 2003 (dbl), 2004 (dbl), 2005 (dbl), 2006 (dbl), 2007 (dbl), 2008
##   (dbl), 2009 (dbl), 2010 (dbl), 2011 (dbl), 2012 (dbl), 2013 (dbl), 2014
##   (dbl)

We also need to be able to save objects like Gender to an Rdata file: save(Gender, file = "Gender.Rdata"). I didn’t put this in a chunk because I didn’t want to have R run this command.

The function saveRDS() can save only one object, and it saves it without its name; you choose the name when you read it back in with readRDS().
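A short sketch of the round trip, using a temporary file and a made-up stand-in for Gender so that nothing is written to the working directory (Restored is a hypothetical name):

```r
# Hypothetical stand-in for the Gender table
Gender <- data.frame(country = c("Aruba", "Chile"), value = c(1.5, 2.5))

path <- tempfile(fileext = ".rds")
saveRDS(Gender, file = path)   # stores one object, without its name

Restored <- readRDS(path)      # we choose the name when reading it back
identical(Gender, Restored)
```

By contrast, save() can store several objects together with their names, and load() restores them under those same names.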

Teaching Tips

These are tips I picked out (thus idiosyncratic):

Commonplace Book